A Software Toolkit for Statistical Data Analysis

Authors

  • A. Pfeiffer
  • A. Ribon
  • P. Viarengo
Abstract

We present a project in progress to develop a software toolkit for statistical data analysis. The toolkit is based on advanced software technologies, integrating generic programming techniques with object-oriented methods, and adopts a rigorous software process to ensure a high-quality product. Thanks to its component-based architecture and its use of the standard AIDA interfaces, the tool can easily be used by other data analysis systems or integrated into experimental frameworks. The initial component of the system addresses goodness-of-fit tests; its applications include the comparison of data distributions in a variety of use cases typical of HEP experiments: regression testing (in various phases of the software life-cycle), validation of simulation through comparison to experimental data, comparison of expected versus reconstructed distributions, comparison of different experimental distributions or of experimental with respect to theoretical ones in physics analysis, and monitoring of detector behavior with respect to a reference in online DAQ. The system will give the user the option to choose among a wide set of goodness-of-fit tests (chi-squared, Kolmogorov-Smirnov, Anderson-Darling, Lilliefors, Kuiper, Cramer-von Mises, etc.), specialised for various types of binned and unbinned distributions. Its flexible design leaves it open to extension with other tests. This system would represent a significant improvement over the comparison tests currently available in HEP libraries, which are limited to the chi-squared and Kolmogorov-Smirnov algorithms. We present the architecture of the toolkit, the detailed design of the basic statistical testing component, and preliminary results of its application, in particular concerning the physics validation of the Geant4 Simulation Toolkit. We discuss the openness of the project, welcoming contributions from experts and user requirements from experiments.
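To make the idea of comparing two unbinned distributions concrete, here is a minimal Python sketch of the two-sample Kolmogorov-Smirnov statistic, one of the tests named in the abstract. It is not the toolkit's API or implementation: the function name `ks_two_sample_statistic` and the sample values are hypothetical, chosen only for illustration. The sketch computes the largest distance between the two empirical cumulative distribution functions and does not compute a p-value.

```python
from bisect import bisect_right


def ks_two_sample_statistic(sample_a, sample_b):
    """Return D = max |F_a(x) - F_b(x)|, the two-sample KS distance.

    F_a and F_b are the empirical CDFs of the two unbinned samples.
    (Hypothetical helper for illustration; not part of the toolkit.)
    """
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")

    a = sorted(sample_a)
    b = sorted(sample_b)

    d = 0.0
    # Both empirical CDFs are step functions that only change at observed
    # values, so it is enough to evaluate them at every pooled data point.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical "simulated" and "reference" samples.
simulated = [1.0, 2.0, 3.0, 4.0]
reference = [3.0, 4.0, 5.0, 6.0]
print(ks_two_sample_statistic(simulated, reference))  # prints 0.5
```

A value of D close to 0 means the two empirical distributions nearly coincide, while D = 1 means the samples do not overlap at all. Turning D into an accept/reject decision requires the distribution of D under the null hypothesis, which this sketch leaves out.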

Related resources

Istituto Nazionale Di Fisica Nucleare

The present project aims to develop an open-source and object-oriented software Toolkit for statistical data analysis. Its statistical testing component contains a variety of Goodness-of-Fit tests, from Chi-squared to Kolmogorov-Smirnov, to less known, but generally much more powerful tests such as Anderson-Darling, Goodman, Fisz-Cramer-von Mises, Kuiper, Tiku. Thanks to the component-based design ...

An R package to analyse LC/MS metabolomic data: MAIT (Metabolite Automatic Identification Toolkit)

Current tools for liquid chromatography and mass spectrometry for metabolomic data cover a limited number of processing steps, whereas online tools are hard to use in a programmable fashion. This article introduces the Metabolite Automatic Identification Toolkit (MAIT) package, which makes it possible for users to perform metabolomic end-to-end liquid chromatography and mass spectrom...

Archival Standards in Open-Access Software and a Proposal of Suitable Software for Domestic Archival Centers

The purpose of this study is to examine descriptive metadata standards in archival open-source software, in order to determine the most appropriate descriptive metadata standard(s) and the encoder software support for these standards. The study combines library research, the Delphi method, and a descriptive survey. Data gathering in the library study is by fiche, in the Delphi method is ...

Lium Spkdiarization: an Open Source Toolkit for Diarization

This paper presents an open-source diarization toolkit which is mostly dedicated to speaker and developed by the LIUM. This toolkit includes hierarchical agglomerative clustering methods using well-known measures such as BIC and CLR. Two applications for which the toolkit has been used are presented: one is for broadcast news using the ESTER 2 data and the other is for telephone conversations u...

A Toolkit for Statistical Comparison of Data Distributions

A typical problem associated with Monte Carlo development and application is the validation of simulation models and results against experimental data. A novel software toolkit has been developed encompassing an ample variety of statistical algorithms for the comparison of data distributions, such as Monte Carlo simulations and experimental data. The toolkit contains a variety of go...

Publication date: 2004